phq-8 score
Probabilistic Textual Time Series Depression Detection
Schmidt, Fabian, Ravan, Seyedehmoniba, Vlassov, Vladimir
Accurate and interpretable predictions of depression severity are essential for clinical decision support, yet existing models often lack uncertainty estimates and temporal modeling. We propose PTTSD, a Probabilistic Textual Time Series Depression Detection framework that predicts PHQ-8 scores from utterance-level clinical interviews while modeling uncertainty over time. PTTSD includes sequence-to-sequence and sequence-to-one variants, both combining bidirectional LSTMs, self-attention, and residual connections with Gaussian or Student-t output heads trained via negative log-likelihood. Evaluated on E-DAIC and DAIC-WOZ, PTTSD achieves state-of-the-art performance among text-only systems (e.g., MAE = 3.85 on E-DAIC, 3.55 on DAIC) and produces well-calibrated prediction intervals. Ablations confirm the value of attention and probabilistic modeling, while comparisons with MentalBERT establish generality. A three-part calibration analysis and qualitative case studies further highlight the interpretability and clinical relevance of uncertainty-aware forecasting.
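To make the idea of a Gaussian output head trained with negative log-likelihood concrete, the following is a minimal sketch in plain Python. It is not the PTTSD implementation: the function names `gaussian_nll` and `gaussian_interval` are hypothetical, and the model that would produce the predicted mean and standard deviation is assumed to exist elsewhere.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of observation y under N(mu, sigma^2).

    mu and sigma stand in for the outputs of a probabilistic head
    (assumption: sigma has already been made positive, e.g. via softplus).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def gaussian_interval(mu, sigma, z=1.96):
    """Central prediction interval; z = 1.96 corresponds to roughly 95% coverage."""
    return mu - z * sigma, mu + z * sigma


# Hypothetical example: true PHQ-8 score 12, predicted mean 8.
confident = gaussian_nll(12.0, 8.0, 1.0)   # narrow interval, wrong mean
cautious = gaussian_nll(12.0, 8.0, 4.0)    # wide interval, same mean
```

In this sketch the loss penalizes an overconfident miss more heavily than a cautious one, which is the mechanism that pushes a model trained this way toward calibrated intervals.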
Cross-Demographic Portability of Deep NLP-Based Depression Models
Rutowski, Tomek, Shriberg, Elizabeth, Harati, Amir, Lu, Yang, Oliveira, Ricardo, Chlebek, Piotr
Deep learning models are rapidly gaining interest for real-world applications in behavioral health. An important gap in current literature is how well such models generalize over different populations. We study Natural Language Processing (NLP) based models to explore portability over two different corpora highly mismatched in age. The first and larger corpus contains younger speakers. It is used to train an NLP model to predict depression. When testing on unseen speakers from the same age distribution, this model performs at AUC=0.82. We then test this model on the second corpus, which comprises seniors from a retirement community. Despite the large demographic differences in the two corpora, we saw only modest degradation in performance for the senior-corpus data, achieving AUC=0.76. Interestingly, in the senior population, we find AUC=0.81 for the subset of patients whose health state is consistent over time. Implications for demographic portability of speech-based applications are discussed.
Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches
Feng, Kexin, Chaspari, Theodora
This study investigates explainable machine learning algorithms for identifying depression from speech. Grounded in evidence from speech production that depression affects motor control and vowel generation, the approach uses pre-trained vowel-based embeddings that integrate semantically meaningful linguistic units. An ensemble learning approach then decomposes the problem into constituent parts characterized by specific depression symptoms and severity levels. Two methods are explored: a "bottom-up" approach with 8 models predicting individual Patient Health Questionnaire-8 (PHQ-8) item scores, and a "top-down" approach using a Mixture of Experts (MoE) with a router module for assessing depression severity. Both methods achieve performance comparable to state-of-the-art baselines, demonstrating robustness and reduced susceptibility to dataset mean/median values. The explainability benefits of the system are discussed, highlighting their potential to assist clinicians in depression diagnosis and screening.
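The "bottom-up" idea of predicting each PHQ-8 item separately and then combining the items can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: the eight per-item predictions are taken as given, the helper names are hypothetical, and the severity bands are the conventional PHQ-8 cut-offs (items scored 0 to 3, total 0 to 24), which the abstract itself does not list.

```python
def phq8_total(item_scores):
    """Sum eight per-item predictions into a PHQ-8 total.

    Each item is clipped to the valid 0-3 range before summing, so the
    total always lies between 0 and 24.
    """
    if len(item_scores) != 8:
        raise ValueError("PHQ-8 has exactly 8 items")
    clipped = [min(3.0, max(0.0, s)) for s in item_scores]
    return sum(clipped)


def phq8_severity(total):
    """Map a PHQ-8 total to the conventional severity band."""
    if total < 5:
        return "none/minimal"
    if total < 10:
        return "mild"
    if total < 15:
        return "moderate"
    if total < 20:
        return "moderately severe"
    return "severe"


# Hypothetical per-item outputs from eight separate models.
items = [1, 2, 0, 3, 1, 1, 0, 2]
total = phq8_total(items)
band = phq8_severity(total)
```

Keeping the item-level scores visible before aggregation is what makes this style of model inspectable symptom by symptom.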
Enhancing Depression Diagnosis with Chain-of-Thought Prompting
Shi, Elysia, Manda, Adithri, Chowdhury, London, Arun, Runeema, Zhu, Kevin, Lam, Michael
When using AI to detect signs of depressive disorder, AI models habitually draw preemptive conclusions. We theorize that using chain-of-thought (CoT) prompting to evaluate Patient Health Questionnaire-8 (PHQ-8) scores will improve the accuracy of the scores determined by AI models. In our findings, when the models reasoned with CoT, the estimated PHQ-8 scores were consistently closer on average to the accepted true scores reported by each participant compared to when not using CoT. Our goal is to expand upon AI models' understanding of the intricacies of human conversation, allowing them to more effectively assess a patient's feelings and tone, therefore being able to more accurately discern mental disorder symptoms; ultimately, we hope to augment AI models' abilities, so that they can be widely accessible and used in the medical field.
Advancing Mental Health Pre-Screening: A New Custom GPT for Psychological Distress Assessment
This study introduces 'Psycho Analyst', a custom GPT model based on OpenAI's GPT-4, optimized for pre-screening mental health disorders. Enhanced with DSM-5, PHQ-8, detailed data descriptions, and extensive training data, the model adeptly decodes nuanced linguistic indicators of mental health disorders. It utilizes a dual-task framework that includes binary classification and a three-stage PHQ-8 score computation involving initial assessment, detailed breakdown, and independent assessment, showcasing refined analytic capabilities. Validation with the DAIC-WOZ dataset reveals F1 and Macro-F1 scores of 0.929 and 0.949, respectively, along with the lowest MAE and RMSE of 2.89 and 3.69 in PHQ-8 scoring. These results highlight the model's precision and transformative potential in enhancing public mental health support, improving accessibility, cost-effectiveness, and serving as a second opinion for professionals.
Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities
Anand, Avinash, Tank, Chayan, Pol, Sarthak, Katoch, Vinayak, Mehta, Shaina, Shah, Rajiv Ratn
Depression has proven to be a significant public health issue, profoundly affecting the psychological well-being of individuals. If it remains undiagnosed, depression can lead to severe health issues, which can manifest physically and even lead to suicide. Diagnosing depression or any other mental disorder generally involves semi-structured interviews alongside supplementary questionnaires, including variants of the Patient Health Questionnaire (PHQ), conducted by clinicians and mental health professionals. This approach places significant reliance on the experience and judgment of trained physicians, making the diagnosis susceptible to personal biases. Given that the underlying mechanisms causing depression are still being actively researched, physicians often face challenges in diagnosing and treating the condition, particularly in its early stages of clinical presentation. Recently, significant strides have been made in artificial neural computing to solve problems involving text, image, and speech in various domains. Our analysis aims to leverage these state-of-the-art (SOTA) models across multiple modalities. The experiments were performed on the Extended Distress Analysis Interview Corpus Wizard of Oz (E-DAIC) dataset presented in the Audio/Visual Emotion Challenge (AVEC) 2019. The proposed solutions, based on proprietary and open-source Large Language Models (LLMs), achieved a Root Mean Square Error (RMSE) of 3.98 on the textual modality, beating the AVEC 2019 challenge baseline results and current SOTA regression analysis architectures. Additionally, the proposed solution achieved an accuracy of 71.43% in the classification task. The paper also includes a novel audio-visual multi-modal network that predicts PHQ-8 scores with an RMSE of 6.51.
Identifying depression-related topics in smartphone-collected free-response speech recordings using an automatic speech recognition system and a deep learning topic model
Zhang, Yuezhou, Folarin, Amos A, Dineley, Judith, Conde, Pauline, de Angel, Valeria, Sun, Shaoxiong, Ranjan, Yatharth, Rashid, Zulqarnain, Stewart, Callum, Laiou, Petroula, Sankesara, Heet, Qian, Linglong, Matcham, Faith, White, Katie M, Oetzmann, Carolin, Lamers, Femke, Siddi, Sara, Simblett, Sara, Schuller, Björn W., Vairavan, Srinivasan, Wykes, Til, Haro, Josep Maria, Penninx, Brenda WJH, Narayan, Vaibhav A, Hotopf, Matthew, Dobson, Richard JB, Cummins, Nicholas, consortium, RADAR-CNS
Language use has been shown to correlate with depression, but large-scale validation is needed. Traditional methods such as clinic studies are expensive, so natural language processing has been applied to social media to predict depression, but limitations remain: a lack of validated labels, biased user samples, and no context. Our study identified 29 topics in 3919 smartphone-collected speech recordings from 265 participants using the Whisper tool and BERTopic model. Six topics with a median PHQ-8 greater than or equal to 10 were regarded as risk topics for depression: No Expectations, Sleep, Mental Therapy, Haircut, Studying, and Coursework. To elucidate the topic emergence and associations with depression, we compared behavioral (from wearables) and linguistic characteristics across identified topics. The correlation between topic shifts and changes in depression severity over time was also investigated, indicating the importance of longitudinally monitoring language use. We also tested the BERTopic model on a similar smaller dataset (356 speech recordings from 57 participants), obtaining some consistent results. In summary, our findings demonstrate that specific speech topics may indicate depression severity. The presented data-driven workflow provides a practical approach to collecting and analyzing large-scale speech data from real-world settings for digital health research.
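The risk-topic rule stated in the abstract (a topic is flagged when the median PHQ-8 of its recordings is at least 10) can be sketched in a few lines. This is only an illustration of that thresholding step: the function name `risk_topics` and the sample records are hypothetical, and topic assignment by Whisper and BERTopic is assumed to have happened upstream.

```python
from collections import defaultdict
from statistics import median


def risk_topics(records, threshold=10):
    """Return topics whose median PHQ-8 score is >= threshold.

    records: iterable of (topic_label, phq8_score) pairs, one per recording.
    """
    by_topic = defaultdict(list)
    for topic, score in records:
        by_topic[topic].append(score)
    return sorted(t for t, scores in by_topic.items() if median(scores) >= threshold)


# Made-up records for illustration only; these are not study data.
sample = [
    ("Sleep", 12), ("Sleep", 9), ("Sleep", 14),
    ("Food", 3), ("Food", 11),
]
flagged = risk_topics(sample)
```

Using the median rather than the mean keeps a single extreme score from flagging an otherwise low-scoring topic.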
Multi-modal Depression Estimation based on Sub-attentional Fusion
Wei, Ping-Cheng, Peng, Kunyu, Roitberg, Alina, Yang, Kailun, Zhang, Jiaming, Stiefelhagen, Rainer
Failure to timely diagnose and effectively treat depression leads to over 280 million people suffering from this psychological disorder worldwide. The information cues of depression can be harvested from diverse heterogeneous resources, e.g., audio, visual, and textual data, raising demand for new effective multi-modal fusion approaches for automatic estimation. In this work, we tackle the task of automatically identifying depression from multi-modal data and introduce a sub-attention mechanism for linking heterogeneous information while leveraging Convolutional Bidirectional LSTM as our backbone. To validate this idea, we conduct extensive experiments on the public DAIC-WOZ benchmark for depression assessment featuring different evaluation modes and taking gender-specific biases into account. The proposed model yields effective results with 0.89 precision and 0.70 F1-score in detecting major depression and 4.92 MAE in estimating the severity. Our attention-based fusion module consistently outperforms conventional late fusion approaches and achieves competitive performance compared to the previously published depression estimation frameworks, while learning to diagnose the disorder end-to-end and relying on far fewer preprocessing steps.
Predicting Depressive Symptom Severity through Individuals' Nearby Bluetooth Devices Count Data Collected by Mobile Phones: A Preliminary Longitudinal Study
Zhang, Yuezhou, Folarin, Amos A, Sun, Shaoxiong, Cummins, Nicholas, Ranjan, Yatharth, Rashid, Zulqarnain, Conde, Pauline, Stewart, Callum, Laiou, Petroula, Matcham, Faith, Oetzmann, Carolin, Lamers, Femke, Siddi, Sara, Simblett, Sara, Rintala, Aki, Mohr, David C, Myin-Germeys, Inez, Wykes, Til, Haro, Josep Maria, Pennix, Brenda WJH, Narayan, Vaibhav A, Annas, Peter, Hotopf, Matthew, Dobson, Richard JB
The Bluetooth sensor embedded in mobile phones provides an unobtrusive, continuous, and cost-efficient means to capture individuals' proximity information, such as the nearby Bluetooth devices count (NBDC). The continuous NBDC data can partially reflect individuals' behaviors and status, such as social connections and interactions, working status, mobility, and social isolation and loneliness, which were found to be significantly associated with depression by previous survey-based studies. This paper aims to explore the NBDC data's value in predicting depressive symptom severity as measured via the 8-item Patient Health Questionnaire (PHQ-8). The data used in this paper included 2,886 bi-weekly PHQ-8 records collected from 316 participants recruited from three study sites in the Netherlands, Spain, and the UK as part of the EU RADAR-CNS study. From the NBDC data two weeks prior to each PHQ-8 score, we extracted 49 Bluetooth features, including statistical features and nonlinear features for measuring periodicity and regularity of individuals' life rhythms. Linear mixed-effect models were used to explore associations between Bluetooth features and the PHQ-8 score. We then applied hierarchical Bayesian linear regression models to predict the PHQ-8 score from the extracted Bluetooth features. A number of significant associations were found between Bluetooth features and depressive symptom severity. Compared with commonly used machine learning models, the proposed hierarchical Bayesian linear regression model achieved the best prediction metrics, R² = 0.526, and root mean squared error (RMSE) of 3.891. Bluetooth features can explain an extra 18.8% of the variance in the PHQ-8 score relative to the baseline model without Bluetooth features (R² = 0.338, RMSE = 4.547).
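The two metrics reported above, and the "extra 18.8% of the variance" figure, follow from standard definitions. The sketch below shows those definitions in plain Python and checks the subtraction 0.526 - 0.338; the helper names and the toy scores are assumptions for illustration and do not reproduce the study's hierarchical Bayesian model.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy PHQ-8 scores and predictions (made up, not study data).
y_true = [0, 5, 10, 15]
y_pred = [1, 4, 11, 14]
toy_rmse = rmse(y_true, y_pred)
toy_r2 = r_squared(y_true, y_pred)

# Extra variance explained by Bluetooth features, from the reported R² values.
extra_variance = round(0.526 - 0.338, 3)
```

The difference of the two reported R² values gives 0.188, which matches the 18.8% stated in the abstract.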
MFCC-based Recurrent Neural Network for Automatic Clinical Depression Recognition and Assessment from Speech
Rejaibi, Emna, Komaty, Ali, Meriaudeau, Fabrice, Agrebi, Said, Othmani, Alice
Author affiliations: Emna Rejaibi (a, b, c), Ali Komaty (d), Fabrice Meriaudeau (e), Said Agrebi (c), Alice Othmani (a). (a) Université Paris-Est, LISSI, UPEC, 94400 Vitry sur Seine, France; (b) INSAT Institut National des Sciences Appliquées et de Technologie, Centre Urbain Nord BP 676-1080, Tunis, Tunisie; (c) Yobitrust, Technopark El Gazala B11 Route de Raoued Km 3.5, 2088 Ariana, Tunisie; (d) University of Sciences and Arts in Lebanon, Ghobeiry, Liban; (e) Université de Bourgogne Franche Comté, ImvIA EA7535/IFTIM.
Abstract
Major depression, also known as clinical depression, is a constant sense of despair and hopelessness. It is a major mental disorder that can affect people of any age, including children, and that negatively affects a person's personal life, work life, social life, and health. Globally, over 300 million people of all ages are estimated to suffer from clinical depression. A deep recurrent neural network-based framework is presented in this paper to detect depression and to predict its severity level from speech. Low-level and high-level audio features are extracted from audio recordings to predict the 24 scores of the Patient Health Questionnaire (a depression assessment test) and the binary class of depression diagnosis. To overcome the small size of Speech Depression Recognition (SDR) datasets, data augmentation techniques are used to expand the labeled training set, and transfer learning is performed: the proposed model is first trained on a related task and then reused as the starting point for the SDR task. The proposed framework is evaluated on the DAIC-WOZ corpus of the AVEC2017 challenge and promising results are obtained. An overall accuracy of 76.27% with a root mean square error of 0.4 is achieved in assessing depression, while a root mean square error of 0.168 is achieved in predicting the depression severity levels.
Introduction
Depression is a mental disorder caused by several factors: psychological, social or even physical factors. Psychological factors are related to permanent stress and the inability to successfully cope with difficult situations. Social factors concern relationship struggles with family or friends and physical factors cover head injuries. Depression describes a loss of interest in every exciting and joyful aspect of everyday life. Mood disorders and mood swings are temporary mental states taking an essential part of daily events, whereas depression is more permanent and can lead to suicide at its extreme severity levels.